Telecom Churn DataFrame

First 10 rows

index | State | Account length | Area code | International plan | Voice mail plan | Number vmail messages | Total day minutes | Total day calls | Total day charge | Total eve minutes | Total eve calls | Total eve charge | Total night minutes | Total night calls | Total night charge | Total intl minutes | Total intl calls | Total intl charge | Customer service calls | Churn | Total calls
0 | KS | 128 | 415 | No | Yes | 25 | 265.1 | 110 | 45.07 | 197.4 | 99 | 16.78 | 244.7 | 91 | 11.01 | 10.0 | 3 | 2.70 | 1 | False | 300
1 | OH | 107 | 415 | No | Yes | 26 | 161.6 | 123 | 27.47 | 195.5 | 103 | 16.62 | 254.4 | 103 | 11.45 | 13.7 | 3 | 3.70 | 1 | False | 329
2 | NJ | 137 | 415 | No | No | 0 | 243.4 | 114 | 41.38 | 121.2 | 110 | 10.30 | 162.6 | 104 | 7.32 | 12.2 | 5 | 3.29 | 0 | False | 328
3 | OH | 84 | 408 | Yes | No | 0 | 299.4 | 71 | 50.90 | 61.9 | 88 | 5.26 | 196.9 | 89 | 8.86 | 6.6 | 7 | 1.78 | 2 | False | 248
4 | OK | 75 | 415 | Yes | No | 0 | 166.7 | 113 | 28.34 | 148.3 | 122 | 12.61 | 186.9 | 121 | 8.41 | 10.1 | 3 | 2.73 | 3 | False | 356
5 | AL | 118 | 510 | Yes | No | 0 | 223.4 | 98 | 37.98 | 220.6 | 101 | 18.75 | 203.9 | 118 | 9.18 | 6.3 | 6 | 1.70 | 0 | False | 317
6 | MA | 121 | 510 | No | Yes | 24 | 218.2 | 88 | 37.09 | 348.5 | 108 | 29.62 | 212.6 | 118 | 9.57 | 7.5 | 7 | 2.03 | 3 | False | 314
7 | MO | 147 | 415 | Yes | No | 0 | 157.0 | 79 | 26.69 | 103.1 | 94 | 8.76 | 211.8 | 96 | 9.53 | 7.1 | 6 | 1.92 | 0 | False | 269
8 | LA | 117 | 408 | No | No | 0 | 184.5 | 97 | 31.37 | 351.6 | 80 | 29.89 | 215.8 | 90 | 9.71 | 8.7 | 4 | 2.35 | 1 | False | 267
9 | WV | 141 | 415 | Yes | Yes | 37 | 258.6 | 84 | 43.96 | 222.0 | 111 | 18.87 | 326.4 | 97 | 14.69 | 11.2 | 5 | 3.02 | 0 | False | 292
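
In the ten rows shown, the Total calls value equals the sum of the day, evening and night call counts, with international calls left out (for row 0: 110 + 99 + 91 = 300). The sketch below checks this on the first three rows. It uses plain dictionaries rather than a real DataFrame, and the helper name `total_calls` is illustrative, not taken from the original code.

```python
# Call counters for the first three rows of the table, plus the reported total.
rows = [
    {"State": "KS", "Total day calls": 110, "Total eve calls": 99,
     "Total night calls": 91, "Total intl calls": 3, "Total calls": 300},
    {"State": "OH", "Total day calls": 123, "Total eve calls": 103,
     "Total night calls": 103, "Total intl calls": 3, "Total calls": 329},
    {"State": "NJ", "Total day calls": 114, "Total eve calls": 110,
     "Total night calls": 104, "Total intl calls": 5, "Total calls": 328},
]

# Assumption inferred from the displayed rows: international calls are excluded.
CALL_COLUMNS = ("Total day calls", "Total eve calls", "Total night calls")


def total_calls(row):
    """Sum the day, evening and night call counts of one row."""
    return sum(row[column] for column in CALL_COLUMNS)


for row in rows:
    # Each derived value should match the Total calls column in the table.
    print(row["State"], total_calls(row), total_calls(row) == row["Total calls"])
```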

Plots

DataFrame analysis

Model predicting the Churn feature with a random forest

[Figure: bar chart for the random forest model, presumably feature importances. Features on the axis: Total eve calls, Total night minutes, Number vmail messages, Customer service calls, Total intl charge, Total day minutes, Total eve minutes, Total intl minutes, Total intl calls, Total day charge, Total night charge, Total eve charge, Total night calls, Total day calls. Value axis: 0.04 to 0.16.]

Pie chart of the Churn feature

Histogram of international plan among churned customers

[Figure: "International plans of Churned Clients". Axes: International plan, Churn. Bar values: No = 346, Yes = 137.]

Histogram of international plan among retained customers

[Figure: "International plans of not Churned Clients". Axes: International plan, Churn. Bar values: No = 2664, Yes = 186.]
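
The two histograms give raw counts, which are hard to compare because the groups differ in size. Reading the bar labels as 346 / 137 (churned, No / Yes) and 2664 / 186 (retained, No / Yes), the churn rate within each plan group can be worked out as in the sketch below. The dictionary layout and the `churn_rate` helper are illustrative.

```python
# Customer counts read from the bar labels of the two histograms.
churned = {"No": 346, "Yes": 137}
retained = {"No": 2664, "Yes": 186}


def churn_rate(plan):
    """Share of customers with the given International plan value who churned."""
    total = churned[plan] + retained[plan]
    return churned[plan] / total


for plan in ("No", "Yes"):
    print(f"International plan = {plan}: churn rate {churn_rate(plan):.1%}")
```

With these counts the rate is about 11.5% for customers without an international plan and about 42.4% for customers with one.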

Models are sorted by F1 score, because both precision and recall matter for the evaluation.
Cross-validation is performed with 5 folds.

index | Model | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC | TT (Sec)
lightgbm | Light Gradient Boosting Machine | 0.9017 | 0.8693 | 0.5609 | 0.6741 | 0.6087 | 0.5534 | 0.5585 | 0.978
gbc | Gradient Boosting Classifier | 0.8761 | 0.8511 | 0.6632 | 0.5409 | 0.5953 | 0.5232 | 0.5273 | 4.866
rf | Random Forest Classifier | 0.8867 | 0.855 | 0.4688 | 0.6163 | 0.5315 | 0.4684 | 0.4746 | 1.526
lr | Logistic Regression | 0.8119 | 0.8338 | 0.729 | 0.4019 | 0.5165 | 0.4126 | 0.4416 | 0.072
svm | SVM - Linear Kernel | 0.8119 | 0.0 | 0.7069 | 0.3985 | 0.5087 | 0.4041 | 0.4298 | 0.066
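
The ordering of the table can be reproduced from its own values. The sketch below holds the rows as plain dictionaries and sorts them by a chosen metric. The `results` list and the `rank_by` helper are illustrative names, not part of the original pipeline.

```python
# Recall, precision and F1 copied from the comparison table (5-fold means).
results = [
    {"model": "rf", "recall": 0.4688, "precision": 0.6163, "f1": 0.5315},
    {"model": "svm", "recall": 0.7069, "precision": 0.3985, "f1": 0.5087},
    {"model": "lightgbm", "recall": 0.5609, "precision": 0.6741, "f1": 0.6087},
    {"model": "lr", "recall": 0.729, "precision": 0.4019, "f1": 0.5165},
    {"model": "gbc", "recall": 0.6632, "precision": 0.5409, "f1": 0.5953},
]


def rank_by(rows, metric):
    """Return the rows ordered from best to worst on the given metric."""
    return sorted(rows, key=lambda row: row[metric], reverse=True)


ranked = rank_by(results, "f1")
for row in ranked:
    print(f'{row["model"]:<9} F1={row["f1"]:.4f}')
```

Ranking by recall instead would put lr first (0.729) and by precision lightgbm first (0.6741), so the choice of F1 is a compromise between the two.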

Bagged model

index | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC
0 | 0.8997 | 0.8406 | 0.6667 | 0.6207 | 0.6429 | 0.5846 | 0.5851
1 | 0.9223 | 0.8914 | 0.6545 | 0.75 | 0.699 | 0.6547 | 0.6567
2 | 0.8847 | 0.9076 | 0.5636 | 0.5849 | 0.5741 | 0.5074 | 0.5076
3 | 0.8847 | 0.8458 | 0.5455 | 0.5882 | 0.566 | 0.4997 | 0.5001
4 | 0.8869 | 0.8774 | 0.6481 | 0.5738 | 0.6087 | 0.5429 | 0.5443
Mean | 0.8957 | 0.8726 | 0.6157 | 0.6235 | 0.6181 | 0.5579 | 0.5588
SD | 0.0144 | 0.0259 | 0.0506 | 0.0651 | 0.0488 | 0.057 | 0.0575
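
The Mean and SD rows summarise the five folds. A quick check with the standard library suggests that SD here is the population standard deviation (divide by n, not n - 1). The sketch below recomputes both rows for Accuracy and F1. The `summarize` helper is an illustrative name.

```python
from statistics import mean, pstdev

# Per-fold values copied from the bagged model table.
accuracy = [0.8997, 0.9223, 0.8847, 0.8847, 0.8869]
f1 = [0.6429, 0.699, 0.5741, 0.566, 0.6087]


def summarize(values):
    """Return (mean, population standard deviation), rounded to 4 decimals."""
    return round(mean(values), 4), round(pstdev(values), 4)


print("Accuracy:", summarize(accuracy))
print("F1:", summarize(f1))
```

Using `statistics.stdev` (the sample standard deviation) instead would give a larger value that does not match the SD row.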

Boosted model

index | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC
0 | 0.8922 | 0.8294 | 0.5926 | 0.6038 | 0.5981 | 0.5359 | 0.5359
1 | 0.9198 | 0.8953 | 0.6 | 0.7674 | 0.6735 | 0.6285 | 0.6347
2 | 0.8872 | 0.8937 | 0.5273 | 0.6042 | 0.5631 | 0.4987 | 0.5002
3 | 0.8922 | 0.8456 | 0.5091 | 0.6364 | 0.5657 | 0.505 | 0.5091
4 | 0.892 | 0.8874 | 0.6296 | 0.5965 | 0.6126 | 0.5499 | 0.5502
Mean | 0.8967 | 0.8703 | 0.5717 | 0.6416 | 0.6026 | 0.5436 | 0.546
SD | 0.0117 | 0.0274 | 0.0458 | 0.0644 | 0.0402 | 0.0465 | 0.0478
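
Within each fold, F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below recomputes it from the Recall and Prec. columns of the boosted model and compares it with the reported F1. Small differences in the fourth decimal are possible because the inputs are already rounded. The function name `f1_from` is illustrative.

```python
# (recall, precision, reported F1) per fold, copied from the boosted model table.
folds = [
    (0.5926, 0.6038, 0.5981),
    (0.6, 0.7674, 0.6735),
    (0.5273, 0.6042, 0.5631),
    (0.5091, 0.6364, 0.5657),
    (0.6296, 0.5965, 0.6126),
]


def f1_from(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


for index, (recall, precision, reported) in enumerate(folds):
    computed = f1_from(precision, recall)
    print(f"fold {index}: computed {computed:.4f}, reported {reported:.4f}")
```

The Mean row is the average of the per-fold F1 values, so it does not have to equal the harmonic mean of the mean Prec. and mean Recall.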

Blended model

index | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC
0 | 0.8972 | 0.8348 | 0.6296 | 0.6182 | 0.6239 | 0.5644 | 0.5644
1 | 0.9173 | 0.891 | 0.5818 | 0.7619 | 0.6598 | 0.6137 | 0.6209
2 | 0.8897 | 0.9023 | 0.4909 | 0.6279 | 0.551 | 0.4892 | 0.4941
3 | 0.8972 | 0.8477 | 0.5273 | 0.6591 | 0.5859 | 0.528 | 0.5323
4 | 0.892 | 0.8888 | 0.6111 | 0.6 | 0.6055 | 0.5429 | 0.543
Mean | 0.8987 | 0.8729 | 0.5681 | 0.6534 | 0.6052 | 0.5476 | 0.5509
SD | 0.0098 | 0.0266 | 0.0519 | 0.0575 | 0.0364 | 0.0411 | 0.0418

Best model

index | Parameters
algorithm | SAMME.R
base_estimator | LGBMClassifier(boosting_type='gbdt', class_weight=None, colsample_bytree=1.0, importance_type='split', learning_rate=0.1, max_depth=-1, min_child_samples=20, min_child_weight=0.001, min_split_gain=0.0, n_estimators=100, n_jobs=-1, num_leaves=31, objective=None, random_state=142, reg_alpha=0.0, reg_lambda=0.0, silent='warn', subsample=1.0, subsample_for_bin=200000, subsample_freq=0)
learning_rate | 1.0
n_estimators | 10
random_state | 142

ROC curve

Precision-recall curve

Confusion matrix

AdaBoostClassifier

Prediction on test data

index | Model | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC
0 | Light Gradient Boosting Machine | 0.9245 | 0.9048 | 0.7054 | 0.7521 | 0.728 | 0.6842 | 0.6847

Prediction on unseen data

Accuracy: 0.846847, AUC: 0.5, Recall: 0.0, Precision: 0.0, F1 Score: 0.0
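
This combination of values (recall, precision and F1 all zero, AUC exactly 0.5, accuracy well above zero) is what results when no customer in the evaluated set is predicted as churned and every customer gets the same score. Whether that is what happened here is not stated in the report. The sketch below only shows that such a predictor reproduces the figures. The label split of 282 non-churners and 51 churners is a hypothetical choice that happens to give the same accuracy, and `roc_auc` is an illustrative pairwise implementation, not a library call.

```python
# Hypothetical label split (an assumption): 282 non-churners, 51 churners.
labels = [0] * 282 + [1] * 51
predictions = [0] * len(labels)  # the model never predicts churn
scores = [0.0] * len(labels)     # identical score for every customer

pairs = list(zip(labels, predictions))
tp = sum(1 for y, p in pairs if y == 1 and p == 1)
fp = sum(1 for y, p in pairs if y == 0 and p == 1)
fn = sum(1 for y, p in pairs if y == 1 and p == 0)
tn = sum(1 for y, p in pairs if y == 0 and p == 0)

accuracy = (tp + tn) / len(labels)
recall = tp / (tp + fn) if tp + fn else 0.0
precision = tp / (tp + fp) if tp + fp else 0.0  # 0/0 is reported as 0
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def roc_auc(y_true, y_score):
    """Share of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


auc = roc_auc(labels, scores)
print(f"Accuracy {accuracy:.6f}, AUC {auc}, Recall {recall}, "
      f"Precision {precision}, F1 {f1}")
```

If the unseen data really contains churners, a result like this usually points to a problem upstream of the model, for example a mismatch in label encoding or preprocessing between training and unseen data, and is worth checking before trusting the accuracy figure.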